46 research outputs found

    CNM: An Interpretable Complex-valued Network for Matching

    This paper seeks to model human language with the mathematical framework of quantum physics. Using the well-developed formulations of quantum physics, this framework unifies different linguistic units in a single complex-valued vector space, e.g. words as particles in quantum states and sentences as mixed systems. A complex-valued network is built to implement this framework for semantic matching. With well-constrained complex-valued components, the network admits interpretations with explicit physical meanings. The proposed complex-valued network for matching (CNM) achieves performance comparable to strong CNN and RNN baselines on two benchmark question answering (QA) datasets.
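
    A minimal numpy sketch of the idea described above (not the authors' released code): words are represented as complex-valued unit vectors, a sentence becomes a mixed system (a density matrix mixing the word projectors), and question-answer matching is a trace inner product between the two density matrices. All dimensions, weights, and function names below are illustrative assumptions.

        import numpy as np

        def word_state(dim, rng):
            """A word as a random unit vector in a complex-valued space (a pure quantum state)."""
            v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
            return v / np.linalg.norm(v)

        def sentence_density(words, weights):
            """A sentence as a mixed system: a convex mixture of word projectors."""
            w = np.array(weights, dtype=float)
            w = w / w.sum()
            return sum(wi * np.outer(v, v.conj()) for wi, v in zip(w, words))

        def match_score(rho_q, rho_a):
            """Matching score as the (real) trace inner product of two density matrices."""
            return float(np.real(np.trace(rho_q @ rho_a)))

        rng = np.random.default_rng(0)
        q = sentence_density([word_state(8, rng) for _ in range(3)], [1.0, 0.5, 0.2])
        a = sentence_density([word_state(8, rng) for _ in range(4)], [1.0, 1.0, 0.3, 0.1])
        print(round(match_score(q, a), 4))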

    On Elastic Language Models

    Large-scale pretrained language models have achieved compelling performance in a wide range of language understanding and information retrieval tasks. Knowledge distillation offers an opportunity to compress a large language model into a small one in order to reach a reasonable latency-performance tradeoff. However, for scenarios where the number of requests (e.g., queries submitted to a search engine) is highly variable, the static tradeoff attained by the compressed language model might not always fit. Once a model is assigned a static tradeoff, it can be inadequate: the latency is too high when the number of requests is large, or the performance is too low when the number of requests is small. To this end, we propose an elastic language model (ElasticLM) that elastically adjusts the tradeoff according to the request stream. The basic idea is to introduce compute elasticity into the compressed language model, so that the tradeoff can vary on the fly along scalable and controllable compute. Specifically, we impose an elastic structure to equip ElasticLM with compute elasticity and design an elastic optimization to learn ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic schedule. Considering the specificity of information retrieval, we adapt ElasticLM to dense retrieval and reranking, yielding ElasticDenser and ElasticRanker respectively. Offline evaluation is conducted on the language understanding benchmark GLUE and on several information retrieval tasks, including Natural Questions, TriviaQA, and MS MARCO. The results show that ElasticLM, along with ElasticDenser and ElasticRanker, performs correctly and competitively compared with an array of static baselines. Furthermore, online simulation with concurrency is also carried out. The results demonstrate that ElasticLM can provide elastic tradeoffs with respect to a varying request stream. Comment: 27 pages, 11 figures, 9 tables
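
    The elastic-schedule idea can be pictured with a toy dispatcher that maps the current request-queue length to the amount of compute spent per request. The depth thresholds and the model call mentioned in the comment are hypothetical, not taken from the paper.

        from collections import deque

        def pick_depth(queue_len, depths=(12, 6, 3)):
            """More pending requests -> a shallower, faster sub-model (hypothetical thresholds)."""
            if queue_len < 8:
                return depths[0]   # low load: full depth, best quality
            if queue_len < 64:
                return depths[1]   # medium load: half-depth sub-model
            return depths[2]       # high load: smallest sub-model, lowest latency

        requests = deque(f"query-{i}" for i in range(80))
        served = []
        while requests:
            depth = pick_depth(len(requests))
            query = requests.popleft()
            # a depth-elastic model would be invoked here, e.g. model(query, num_layers=depth)
            served.append((query, depth))
        print(served[0], served[40], served[-1])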

    Injecting Knowledge into Biomedical Pre-trained Models via Polymorphism and Synonymous Substitution

    Pre-trained language models (PLMs) are considered able to store relational knowledge present in the training data. However, some relational knowledge appears to be lost in PLMs due to reporting bias: low-frequency relational knowledge may be underexpressed compared with high-frequency knowledge. This suggests that injected relational knowledge is not necessarily redundant with the knowledge already stored in PLMs, but can instead be complementary. We therefore propose a simple yet effective approach to inject relational knowledge into PLMs, inspired by three observations (namely, polymorphism, synonymous substitution, and association). In particular, we switch entities in the training corpus to related entities (hypernyms, hyponyms, synonyms, or arbitrarily related concepts). Experimental results show that the proposed approach not only better captures relational knowledge, but also improves performance on various biomedical downstream tasks. Our model is available at https://github.com/StevenZHB/BioPLM_InjectingKnowledge
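
    The corpus-rewriting step can be sketched as a simple entity-substitution pass over the training text. The toy relation dictionary below is a stand-in for a real biomedical knowledge source and is purely illustrative, not the authors' pipeline.

        import random

        # Toy stand-in for a biomedical knowledge base mapping an entity
        # to related entities (synonyms, hypernyms, hyponyms, ...).
        RELATED = {
            "aspirin": ["acetylsalicylic acid", "NSAID"],
            "myocardial infarction": ["heart attack", "MI"],
        }

        def substitute_entities(sentence, p=0.3):
            """With probability p, replace each known entity with a related one."""
            for entity, alternatives in RELATED.items():
                if entity in sentence and random.random() < p:
                    sentence = sentence.replace(entity, random.choice(alternatives))
            return sentence

        random.seed(0)
        print(substitute_entities("aspirin reduces the risk of myocardial infarction"))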

    IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

    This paper provides a unified account of two schools of thinking in information retrieval modelling: generative retrieval, which focuses on predicting relevant documents given a query, and discriminative retrieval, which focuses on predicting relevancy given a query-document pair. We propose a game-theoretic minimax framework to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve better estimation for document ranking. Our experimental results demonstrate significant performance gains of as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications, including web search, item recommendation, and question answering. Comment: 12 pages; appendix added
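
    The minimax loop can be illustrated with a toy single-query example (not the authors' implementation): a generator keeps a softmax distribution over candidate documents and is updated REINFORCE-style with the discriminator score as its reward, while a linear discriminator is pushed up on labelled-relevant documents and down on generator samples. Features, dimensions, and learning rates are made-up assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n_docs, dim = 20, 8
        doc_feats = rng.normal(size=(n_docs, dim))   # features of (query, doc) pairs
        relevant = {2, 5, 11}                        # labelled relevant doc ids

        g_logits = np.zeros(n_docs)                  # generator's scores over documents
        d_w = np.zeros(dim)                          # discriminator weights

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        for _ in range(200):
            p = softmax(g_logits)
            sampled = rng.choice(n_docs, size=4, p=p)    # generator "attacks" with hard samples
            for i in relevant:                           # discriminator step: up on labelled docs
                d_w += 0.01 * doc_feats[i]
            for i in sampled:                            # ... and down on generated samples
                d_w -= 0.01 * doc_feats[i]
            rewards = doc_feats[sampled] @ d_w           # generator step: discriminator score as reward
            for i, r in zip(sampled, rewards):
                g_logits[i] += 0.05 * r * (1.0 - p[i])   # REINFORCE-style softmax gradient

        print("generator's top-5 documents:", list(np.argsort(-g_logits)[:5]))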

    Semantic Hilbert Space for Text Representation Learning

    Capturing the meaning of sentences has long been a challenging task. Current models tend to apply linear combinations of word features to compose the semantics of larger-granularity units, e.g. phrases, sentences, and documents. However, semantic linearity does not always hold in human language. For instance, the meaning of the phrase `ivory tower' cannot be deduced by linearly combining the meanings of `ivory' and `tower'. To address this issue, we propose a new framework that models different levels of semantic units (e.g. sememe, word, sentence, and semantic abstraction) in a single Semantic Hilbert Space, which naturally admits a non-linear semantic composition by means of complex-valued word representations. An end-to-end neural network (https://github.com/wabyking/qnn) is proposed to implement the framework for the text classification task, and evaluation results on six benchmark text classification datasets demonstrate the effectiveness, robustness, and self-explanation power of the proposed model. Furthermore, intuitive case studies are conducted to help end users understand how the framework works. Comment: accepted at WWW 2019
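
    A minimal numpy sketch of the non-linear composition (not the released qnn model): words are complex amplitude-phase vectors, a phrase is a normalised superposition, and the measured features pick up a phase-interference term, so the phrase features are not a linear mix of the per-word features. All names and dimensions are illustrative assumptions.

        import numpy as np

        def complex_word(amplitude, phase):
            """Word vector: real amplitude times e^{i*phase}."""
            return amplitude * np.exp(1j * phase)

        def compose(*words):
            """Superpose word states and renormalise to a unit phrase state."""
            s = np.sum(words, axis=0)
            return s / np.linalg.norm(s)

        def measure(state):
            """Probability of each standard-basis feature: |amplitude|^2."""
            return np.abs(state) ** 2

        rng = np.random.default_rng(0)
        ivory = complex_word(rng.random(6), rng.uniform(0, 2 * np.pi, 6))
        tower = complex_word(rng.random(6), rng.uniform(0, 2 * np.pi, 6))
        phrase = compose(ivory, tower)

        # The phrase features differ from the average of the word features:
        # the phases interfere, which is the non-linear composition at work.
        print(measure(phrase))
        print((measure(ivory / np.linalg.norm(ivory)) + measure(tower / np.linalg.norm(tower))) / 2)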

    A Survey on Biomedical Text Summarization with Pre-trained Language Model

    The exponential growth of biomedical texts, such as the biomedical literature and electronic health records (EHRs), poses a major challenge for clinicians and researchers who need to access clinical information efficiently. To address the problem, biomedical text summarization has been proposed to support clinical information retrieval and management, aiming at generating concise summaries that distill key information from single or multiple biomedical documents. In recent years, pre-trained language models (PLMs) have become the de facto standard for various natural language processing tasks in the general domain. Most recently, PLMs have been further investigated in the biomedical field and have brought new insights into the biomedical text summarization task. In this paper, we systematically summarize recent advances that explore PLMs for biomedical text summarization, to help understand recent progress, challenges, and future directions. We categorize PLM-based approaches according to how they utilize PLMs and which PLMs they use. We then review the available datasets, recent approaches, and evaluation metrics for the task. We finally discuss existing challenges and promising future directions. To facilitate the research community, we collect open resources, including available datasets, recent approaches, code, evaluation metrics, and a leaderboard, in a public project: https://github.com/KenZLuo/Biomedical-Text-Summarization-Survey/tree/master. Comment: 19 pages, 6 figures, TKDE under review